19 research outputs found

    Targeting a Practical Approach for Robot Vision with Ensembles of Visual Features

    We approach the task of topological localization in mobile robotics without relying on the temporal continuity of image sequences. Information about the environment comes from images taken with a perspective colour camera mounted on a robot platform. The main contributions of this work are quantitative examinations of a wide variety of global and local invariant features and of different distance measures. We focus on finding the optimal set of features and carry out an in-depth analysis: the characteristics of the different features are analysed using widely known dissimilarity measures and graphical views of overall performance. The quality of the resulting configurations is also tested in the localization stage, by means of location recognition in the Robot Vision task of the ImageCLEF International Evaluation Campaign. The long-term goal of this project is to develop integrated, stand-alone capabilities for real-time topological localization under varying illumination conditions and over longer routes.

    EMBEDDIA at SemEval-2022 Task 8: Investigating Sentence, Image, and Knowledge Graph Representations for Multilingual News Article Similarity

    In this paper, we present the participation of the EMBEDDIA team in SemEval-2022 Task 8 (Multilingual News Article Similarity). We cover several techniques and propose different methods for estimating multilingual news article similarity by exploring the dataset in its entirety. We take advantage of the textual content of the articles, the provided metadata (e.g., titles, keywords, topics), the translated articles, the images (those that were available), and knowledge graph-based representations of the entities and relations present in the articles. We then compute the semantic similarity between the different features and predict the similarity scores through regression. Our findings show that, while our proposed methods obtained promising results, exploiting semantic textual similarity with sentence representations remains the strongest approach. Finally, in the official SemEval-2022 Task 8 evaluation, we ranked fifth in the overall team ranking for the cross-lingual results and second for the English-only results.
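The step of computing one similarity score per representation can be sketched as follows. This is a minimal illustration, not the EMBEDDIA system: cosine similarity is an assumed choice of measure, and the function names and the dictionary layout (representation name mapped to a vector) are hypothetical. The resulting scores would then serve as inputs to a regressor.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_features(article_a, article_b):
    """Return one cosine score per representation present in both articles.

    Each article is a dict mapping a (hypothetical) representation name,
    such as "text" or "title", to an embedding vector.
    """
    shared = sorted(set(article_a) & set(article_b))
    return {name: cosine_similarity(article_a[name], article_b[name])
            for name in shared}
```

Representations missing from either article (for example, an unavailable image) are simply skipped, so the feature set can vary from pair to pair.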

    L'importance des entités pour la tâche de détection d'événements en tant que système de question-réponse

    Oral presentation. National audience. In this article, we address a recent and little-studied paradigm for the event detection task by framing it as a question-answering problem with the possibility of multiple answers and the support of entities. The task of extracting event triggers is thus transformed into a task of identifying answer spans from a context, while also focusing on the surrounding entities. The architecture is based on a pre-trained and fine-tuned language model, where the input context is augmented with entities marked at different levels, their positions, their types, and, finally, their argument roles. Our experiments on the ACE 2005 corpus demonstrate that the proposed model properly exploits entity information for event detection and that it is a viable solution for this task. Moreover, we demonstrate that our method, with different entity markers, is particularly able to extract unseen event types in few-shot learning settings.

    Exploring Entities in Event Detection as Question Answering

    International audience. In this paper, we approach a recent and under-researched paradigm for the task of event detection (ED) by casting it as a question-answering (QA) problem with the possibility of multiple answers and the support of entities. The extraction of event triggers is thus transformed into the task of identifying answer spans from a context, while also focusing on the surrounding entities. The architecture is based on a pre-trained and fine-tuned language model, where the input context is augmented with entities marked at different levels, their positions, their types, and, finally, their argument roles. Experiments on the ACE 2005 corpus demonstrate that the proposed model properly leverages entity information in detecting events and that it is a viable solution for the ED task. Moreover, we demonstrate that our method, with different entity markers, is particularly able to extract unseen event types in few-shot learning settings.
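Augmenting the input context with entity markers can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' implementation: the `<TYPE> ... </TYPE>` marker format, the question wording, and both function names are hypothetical, and entity spans are assumed not to overlap.

```python
def mark_entities(tokens, entities):
    """Wrap each entity span in type markers, e.g. <GPE> Baghdad </GPE>.

    `entities` is a list of (start, end, type) token spans with `end`
    exclusive. Spans are assumed not to overlap (assumption of this sketch).
    """
    opens, closes = {}, {}
    for start, end, etype in entities:
        opens.setdefault(start, []).append(f"<{etype}>")
        closes.setdefault(end, []).append(f"</{etype}>")

    out = []
    for i, token in enumerate(tokens):
        out.extend(closes.get(i, []))  # close spans that end before token i
        out.extend(opens.get(i, []))   # open spans that start at token i
        out.append(token)
    out.extend(closes.get(len(tokens), []))  # spans ending at the last token
    return " ".join(out)


def build_qa_input(tokens, entities, question="What is the trigger?"):
    """Pair a (hypothetical) question with the entity-marked context."""
    return {"question": question, "context": mark_entities(tokens, entities)}
```

A QA model fine-tuned on such pairs would then be asked to return zero, one, or several answer spans from the marked context, each span being a candidate event trigger.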

    The importance of character-level information in an event detection model

    26th International Conference on Applications of Natural Language to Information Systems, NLDB 2021, Saarbrücken, Germany, June 23–25, 2021, Proceedings. International audience. This paper tackles the task of event detection, which aims at identifying and categorizing event mentions in texts. One of the difficulties of this task is the problem of event mentions corresponding to misspelled, custom, or out-of-vocabulary words. To analyze the impact of character-level features, we propose integrating character embeddings, which can capture morphological and shape information about words, into a convolutional model for event detection. More precisely, we evaluate two strategies for performing this integration and show that a late fusion approach outperforms both an early fusion approach and models that integrate character or subword information, such as ELMo or BERT.

    Dataset and Models for Detection of News Agency Releases in Historical Newspapers

    This record contains the annotated datasets and models used and produced for the work reported in the Master Thesis "Where Did the News Come From? Detection of News Agency Releases in Historical Newspapers" (link). Please cite this report if you are using the models/datasets or find it relevant to your research:

    @article{Marxen:305129,
      title = {Where Did the News Come From? Detection of News Agency Releases in Historical Newspapers},
      author = {Marxen, Lea},
      pages = {114p},
      year = {2023},
      url = {http://infoscience.epfl.ch/record/305129},
    }

    1. DATA
    The newsagency-dataset contains historical newspaper articles with annotations of news agency mentions. The articles are divided into French (fr) and German (de) subsets and a train, dev and test set respectively. The data is annotated at token-level in the CoNLL format with IOB tagging format. The distribution of articles in the different sets is as follows:

    Dataset Statistics
    Set    Lg.  Docs  Agency Mentions
    Train  de    333              493
           fr    903            1,122
    Dev    de     32               26
           fr    110              114
    Test   de     32               58
           fr    120              163

    Due to an error, there are seven duplicated articles in the French test set (article IDs: courriergdl-1847-10-02-a-i0002, courriergdl-1852-02-14-a-i0002, courriergdl-1860-10-31-a-i0016, courriergdl-1864-12-15-a-i0005, lunion-1860-11-27-a-i0004, lunion-1865-02-05-a-i0012, lunion-1866-02-16-a-i0009).

    2. MODELS
    The two agency detection and classification models used for the inference on the impresso Corpus are released as well:
    newsagency-model-de: based on German BERT (with maximum sequence length 128), fine-tuned with the German training set of the newsagency-dataset
    newsagency-model-fr: based on French Europeana BERT (with maximum sequence length 128), fine-tuned with the French training set of the newsagency-dataset
    The models perform multitask classification with two prediction heads, one for token-level agency entity classification and one for sentence-level (has_agency: yes/no). They can be run with TorchServe, for details see the newsagency-classification repository. Please refer to the report for further information or contact us.

    3. CODE
    https://github.com/impresso/newsagency-classification

    4. CONTACT
    Maud Ehrmann (EPFL-DHLAB)
    Emanuela Boros (EPFL-DHLAB)
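Token-level IOB annotation in a CoNLL-style file can be read back into mentions along the following lines. This is a minimal sketch under assumptions not confirmed by the record: one token per line with the tag in the last whitespace-separated column, blank lines between sentences, and a hypothetical `AGENCY` label. The actual column layout and tag names should be checked against the dataset files.

```python
def read_iob_mentions(lines):
    """Collect (mention text, label) pairs from IOB-tagged token lines.

    Assumes the token is in the first column and the IOB tag in the last
    column; the real dataset layout may differ.
    """
    mentions = []
    current_tokens, current_label = [], None

    def flush():
        # Store the mention being built, if any, and reset the state.
        nonlocal current_tokens, current_label
        if current_tokens:
            mentions.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for raw in lines:
        line = raw.strip()
        if not line:
            flush()  # a sentence boundary ends any open mention
            continue
        columns = line.split()
        token, tag = columns[0], columns[-1]
        if tag.startswith("B-"):
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            flush()  # "O" tag, or an I- tag that continues nothing
    flush()
    return mentions
```

Counting the returned mentions per split is one way to cross-check a local copy against the statistics listed in the record.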

    Intérêt des modèles de caractères pour la détection d'événements

    International audience. This article addresses the task of event detection, which aims to identify and categorize event mentions in texts. One of the difficulties of this task is the problem of event mentions corresponding to misspelled, highly specific, or out-of-vocabulary words. To analyze the impact of handling them through character models, we propose integrating character embeddings, which can capture morphological and shape information about words, into a convolutional model for event detection. More precisely, we evaluate two strategies for performing this integration and show that a late fusion approach outperforms both an early fusion approach and models that integrate character or subword information, such as ELMo or BERT.

    Multilingual Epidemiological Text Classification: A Comparative Study

    International audience. In this paper, we approach the multilingual text classification task in the context of the epidemiological field. Multilingual text classification models tend to perform differently across languages (low- or high-resource), particularly when the dataset is highly imbalanced, which is the case for epidemiological datasets. We conduct a comparative study of different machine learning and deep learning text classification models using a dataset of news articles related to epidemic outbreaks in six languages, four low-resource and two high-resource, in order to analyze the influence of the nature of the language, the structure of the document, and the size of the data. Our findings indicate that the performance of models based on fine-tuned language models exceeds that of the chosen baselines, which include a specialized epidemiological news surveillance system and several machine learning models, by more than 50%. Also, low-resource languages are highly influenced not only by the typology of the languages on which the models have been pre-trained and/or fine-tuned but also by their size. Furthermore, we discover that the beginning and the end of documents provide the most salient features for this task and that, as expected, the performance of the models is proportionate to the training data size.
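The observation about document beginnings and endings suggests one simple way to fit long articles into a model with a fixed input length: keep the head and the tail and drop the middle. The sketch below is only an illustration of that idea, not a procedure described in the abstract; the function name and parameters are hypothetical.

```python
def head_tail_truncate(tokens, max_len, head_len):
    """Keep the first `head_len` and the last `max_len - head_len` tokens.

    Documents that already fit within `max_len` are returned unchanged.
    """
    if max_len < 0 or not 0 <= head_len <= max_len:
        raise ValueError("require 0 <= head_len <= max_len")
    if len(tokens) <= max_len:
        return list(tokens)
    tail_len = max_len - head_len
    # Guard against tail_len == 0, since tokens[-0:] would return everything.
    tail = list(tokens[-tail_len:]) if tail_len > 0 else []
    return list(tokens[:head_len]) + tail
```

The split between head and tail is a tunable choice; how it affects classification in any given language would need to be measured rather than assumed.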

    Étude comparative de méthodes de classification multilingue appliquées à l'épidémiologie

    In this article, we address the task of multilingual text classification in the epidemiological domain. We compare different machine learning and deep learning models using a multilingual dataset of news articles in six languages. Our goal is to analyze the influence of the language family, the structure of the document, and the size of the data on the classification results. Our results indicate that the performance of models based on language models exceeds that of the baselines, which include a specialized epidemiological surveillance system and several machine learning models, by more than 50%.